System for Estimating Download Speed From Passive Measurements

ABSTRACT

A system for passive estimation of throughput in an electronic network is disclosed. The system may include a plurality of mobile devices configured to operate in the network and may further include an electronic data processor. The processor may be configured to access flow records for data flows associated with the mobile devices during a predetermined time interval. Additionally, the processor may be configured to annotate the flow records with an application field and a content provider field. The processor may also be configured to determine a flow type of each data flow based on the application field and the content provider field of the flow records. Furthermore, the processor may be configured to generate a throughput index that only includes non-rate-limited flow types. Moreover, the processor may be configured to estimate maximum throughput for each data flow having non-rate-limited flow types in the throughput index.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 12/963,326, filed Dec. 8, 2010, now U.S. Pat. No. 8,462,625, which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present application relates to throughput estimation techniques and, more particularly, to a system for estimating download speed from passive measurements.

BACKGROUND

The achievable throughput at which users may download or access different types of content at various locations and times is a very important metric to service providers. Being privy to such knowledge enables the service providers to more effectively provision additional capacity in a particular region of a network of the service provider and/or at particular times in the network. Currently, a variety of different methods and systems exist for measuring download rates and/or throughput in a network. For example, current techniques for measuring throughput involve periodically downloading large files from a number of active probes while measuring their achieved throughput. However, such tests place substantial loads on the network being examined, may not necessarily represent the actual experiences that users undergo, and are often expensive to deploy and maintain. Accordingly, such active tests often are not representative of a portion of a network, and in particular, a wireless network.

SUMMARY

A system for passive estimation of throughput in a network is disclosed. The system may be configured to analyze data flows associated with one or more devices operable in a network. In particular, the system may be configured to collect and examine flow records for the data flows and annotate the flow records with application and content provider fields. The system may then be configured to determine a flow type of each data flow based on the application field and the content provider field of the flow record. After the flow types have been determined for the data flows, the system may generate a throughput index which may include non-rate-limited flow types. The system may then provide throughput estimates for the data flows having non-rate-limited flow types in the throughput index.

In one embodiment, the system may include an electronic data processor which may be configured to access a flow record for each data flow of a plurality of data flows during a predetermined time interval. The plurality of data flows may be associated with a plurality of computing devices. The electronic data processor may also be configured to annotate the flow record for each data flow with an application field and a content provider field. The application field may indicate an application protocol, and the content provider field may indicate a content provider with which each data flow is in communication. Additionally, the electronic data processor may be configured to determine a flow type of each data flow based on the application field and the content provider field of the flow record. Furthermore, the electronic data processor may be configured to generate a throughput index, which includes the flow type of each data flow only if the flow type is determined to be a non-rate-limited flow type. Once the throughput index is generated, the electronic data processor may be configured to estimate an average maximum throughput for each data flow having the non-rate-limited flow type in the throughput index.

In another embodiment, a method for passive estimation of throughput in a network may be provided. The method may include collecting a flow record for each data flow of a plurality of data flows during a predetermined time interval. The plurality of data flows may be associated with computing devices in the network. The method may also include annotating the flow record for each data flow with an application field and a content provider field. The application field may indicate an application protocol, and the content provider field may indicate a content provider with which each data flow is in communication. Additionally, the method may include determining a flow type of each data flow based on the application field and the content provider field of the flow record. Flow types may include, but are not limited to, a rate-capped flow type, a partially rate-limited flow type, and a non-rate-limited flow type. The method may also include generating a throughput index. The throughput index may include the flow type of each data flow if the flow type is determined to be the non-rate-limited flow type. Furthermore, the method may include selecting each data flow having the flow type in the throughput index and estimating an average maximum throughput for each data flow selected.

According to another exemplary embodiment, a computer-readable medium comprising instructions for passive estimation of throughput in a network may be provided. The computer instructions, when loaded and executed by an electronic processor, may cause the electronic processor to perform activities including the following: annotating a flow record for each data flow of a plurality of data flows with an application field and a content provider field, wherein the application field indicates an application protocol and the content provider field indicates a content provider each data flow is communicating with, and wherein the plurality of data flows are associated with computing devices in a network; determining a flow type of each data flow based on the application field and the content provider field of the flow record; generating a throughput index, wherein the throughput index comprises the flow type of each data flow only if the flow type is determined to be a non-rate-limited flow type; selecting each data flow having the non-rate-limited flow type in the throughput index; and estimating an average maximum throughput for each data flow selected.

These and other features of the passive measurement system are described in the following detailed description, drawings, and appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic view of a system providing passive estimation of throughput in a network according to an embodiment of the present invention.

FIG. 2 is a graph illustrating median normalized throughput of non-rate-limited data flow records versus data flow size.

FIG. 3 is a line graph featuring a distribution of measured throughput values over flows greater than or equal to one megabyte for several flow types.

FIG. 4 is a histogram of the 95^(th) percentile of throughput from each application, content-provider, and application/content provider flow type.

FIG. 5 is a histogram which illustrates maximum slope ratio of each application/content provider flow type.

FIG. 6 is a table illustrating flow types, the percentage of one megabyte plus flows of each flow type, and the normalized median and mean throughputs of their one megabyte plus flows.

FIG. 7 is a table illustrating the percent of flows and application/content provider types that are rate-capped, partially rate-limited, and both rate-capped and partially rate-limited, along with the percent of flow types utilized in a throughput index.

FIG. 8 is a scatter plot illustrating a comparison of active and passive estimates for a region.

FIG. 9 is a histogram illustrating the relative difference of passive estimates to active estimates in multiple regions.

FIG. 10 is a histogram illustrating the correlation of passive estimates to active throughput estimates in multiple regions.

FIG. 11 depicts an exemplary method for passive estimation of throughput in a network according to the system.

DETAILED DESCRIPTION OF THE INVENTION

The exemplary embodiments of the present disclosure are described with respect to systems and methods for estimation of throughput in a network. The system may be utilized to effectively estimate throughput in a network by utilizing passive measurements rather than using active measuring utilities. The system may be configured to examine data flows associated with one or more devices in a communications network. Also, the system may be configured to access flow records for the data flows and flag or mark the flow records with application and content provider fields. The contents of the application field and the content provider field of the flow record may be utilized by the system to determine a flow type of each data flow. Once the flow types have been determined for each of the data flows, the system may construct a throughput index that may include flow types of each data flow that are determined to have non-rate-limited or non-rate-capped flow types. Accordingly, the system may then provide throughput estimates for the data flows having non-rate-limited or non-rate-capped flow types in the throughput index. The exemplary embodiments can be applied to other types of systems and methods.

Referring to the drawings and in particular FIG. 1, an exemplary system 100 for passive estimation of throughput in a network is schematically illustrated. Maximum throughput may be defined as the achievable throughput of a steady-state flow (such as a TCP flow) at a given time and location in the network; however, other definitions are also contemplated. The system 100 may include one or more computing devices 102. The computing devices 102 may include devices such as, but not limited to, a computer, an electronic processor, a hand-held device, a personal digital assistant, a mobile device, a cellular phone, a smart phone, a communications device, a router, a server, and other devices. For example, the computing devices 102 may be HSDPA category six devices, which may be able to reach 3.6 Mbps in the download direction. In an embodiment according to the present disclosure, the aforementioned devices may be utilized in conjunction with one another. Additionally, the system 100 may include a communications network 104, which may include, but is not limited to, a wireless network, an Ethernet network, a satellite network, a broadband network, a cellular network, a private network, a cable network, an interactive television network, the Internet, or any other suitable network. In one embodiment, the communications network 104 is a wireless network, such as a 3G wireless network.

The system 100 may also include an electronic data processor 106, which may be configured to perform various calculations and operations to provide the passive estimates. The electronic data processor 106 may be incorporated into various types of computing devices such as, but not limited to, a server, a desktop computer, a laptop computer, a mobile device, a personal digital assistant, a hand-held device, a router, a switch, and/or other types of computing devices. Furthermore, the system 100 may include a database 108, which may be configured to store various types of data and information traversing the communications network 104 or otherwise. Both the electronic data processor 106 and the database 108 may be devices associated with a service provider 110. The electronic data processor 106 and the database 108 may be configured to communicate with one another, the communications network 104, and the computing devices 102. Also, the service provider 110 may control the communications network 104 and control the access of the various computing devices 102 to the communications network 104.

Notably, the system 100 may be configured to estimate maximum throughput by using passively measured flow records. Specifically, the system 100 may be configured to collect, examine, or both collect and examine, all given flow records, such as TCP flow records, that traverse the communications network 104 during a predetermined time interval and output an estimate of the average maximum throughput over the predetermined time interval when downloading content from a non-rate-limited internet source provider. In operation, the electronic data processor 106 may be configured to collect a flow record for each data flow occurring in the system 100 during a predetermined time interval. For example, a flow record may be collected for each flow every minute or at another time interval. The data flows may be flows that are either intended for the computing devices 102 or flows that are transmitted from the computing devices 102. The processor 106 may also collect the flow records for a certain percentage of users in the communications network 104, such as three percent of the users in the communications network 104. The flow records may optionally be stored in database 108 of the service provider 110. In one embodiment, each data flow occurring in the communications network 104 may be distinguished from another data flow by a tuple. As an illustration, the distinguishing tuple may be a standard (ipsrc, ipdst, sport, dport) tuple or other appropriate tuple. Each flow record may be annotated with an application field and a content provider field, and the annotation may be performed by the electronic data processor 106.

The application field may indicate or correlate to an application protocol utilized in the data flow that the flow record is associated with. On the other hand, the content provider field may indicate a service/content provider that the particular data flow is communicating with. In one embodiment, the application field may be based on application headers and port numbers. In another embodiment, the content provider field may be identified by an HTTP Content-Provider header, other header, or a domain name service name of a server associated with the content provider. In yet another embodiment, the flow record may be further annotated with additional fields/statistics. For example, the electronic data processor 106 may annotate the flow record with a bytes field. The bytes field may be utilized to indicate a volume of data that is transferred during the predetermined time interval. The electronic data processor 106 may also annotate the flow record with duration and total bytes fields. The duration field may indicate a time interval between the first and last packets for a particular data flow and the total bytes field may indicate a volume of data transferred since the data flow was initiated. In an embodiment, the flow records may be configured to include no personally identifying information.
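As a non-limiting illustration, an annotated flow record of the kind described above may be sketched as a small data structure. The class names FlowKey and FlowRecord, the field name interval_bytes, and the choice to compute throughput as total bytes divided by duration (in bytes per second) are assumptions made for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import NamedTuple, Tuple


class FlowKey(NamedTuple):
    """Hypothetical (ipsrc, ipdst, sport, dport) tuple distinguishing one flow."""
    ipsrc: str
    ipdst: str
    sport: int
    dport: int


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical annotated flow record; no personally identifying fields."""
    key: FlowKey
    application: str        # application field, e.g. derived from headers/ports
    content_provider: str   # content provider field, e.g. from a header or DNS name
    interval_bytes: int     # bytes field: volume moved in the collection interval
    duration: float         # seconds between first and last packet of the flow
    total_bytes: int        # volume moved since the flow was initiated

    @property
    def flow_type(self) -> Tuple[str, str]:
        # A flow type is keyed by the (application, content provider) pair.
        return (self.application, self.content_provider)

    @property
    def throughput(self) -> float:
        # Assumption: throughput is total bytes over duration, in bytes/second.
        return self.total_bytes / self.duration


rec = FlowRecord(FlowKey("10.0.0.1", "192.0.2.7", 40000, 80),
                 "http", "C1", 500_000, 4.0, 2_000_000)
print(rec.flow_type, rec.throughput)
```

Keeping the flow type as a pair rather than two separate keys reflects the later observation that a single content provider may serve both rate-limited and non-rate-limited applications.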

Rather than merely applying a summary function over byte/duration values in all flow records (e.g. the mean of the values), the electronic data processor 106 may be configured to analyze and take into account the data flow size, the application protocol, and the content provider when providing the estimate of throughput. With regard to data flow size, the electronic data processor 106 may be configured to determine whether each data flow of the data flows occurring in the communications network 104 has the minimum flow size required to achieve a steady-state throughput. As an illustration, a significant number of bytes of a particular data flow may often be transferred before achieving a steady-state throughput. Such a scenario may occur when the data flow transfer is beginning and the data flow initiates in a slow-start phase that gradually checks for available capacity in the network. Accordingly, the electronic data processor 106 may be configured to determine a flow size that enables the majority of data flows in the communications network 104 to exit a phase such as a slow-start phase. By determining the flow size to exit such a phase and only including those data flows having such a flow size, the estimations provided by the electronic data processor 106 may be more indicative of the maximum throughput.

Referring now also to FIG. 2, a graph illustrating median normalized throughput of non-rate-limited flow records versus flow size is schematically illustrated. In this example, all flow records with size 2^(i) ≤ total bytes < 2^(i+1) are aggregated in the bin 2^(i). Additionally, FIG. 2 illustrates that the median measured throughput, in this case somewhere between 0.5 and 0.6, of non-rate-limited flow records stabilizes at approximately one megabyte (1 MB). The electronic data processor 106 may utilize the bytes threshold at which the measured throughput stabilizes as a factor in its throughput estimations. For example, the electronic data processor 106 may exclude all data flows that do not include enough bytes to achieve stabilized throughput from the estimation calculations. Although the electronic data processor 106 may be configured to execute and perform a summary function over the byte/duration values in all flow records that have a total bytes value greater than or equal to the throughput stabilization threshold (in this case 1 MB), such a function may not be sufficient since measured throughput of identically sized large flows may still vary based on the application protocol utilized and the content provider utilized.
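As a non-limiting illustration, the power-of-two binning and the size threshold described above may be sketched as follows. The function names, the representation of a flow as a (total bytes, throughput) pair, and the use of 10**6 bytes for one megabyte are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import median

# Assumption: 1 MB is taken as 10**6 bytes; the disclosure does not say
# whether a decimal or binary megabyte is meant.
STEADY_STATE_BYTES = 1_000_000


def median_throughput_by_size_bin(flows):
    """Group (total_bytes, throughput) pairs into bins where bin i holds
    flows with 2**i <= total_bytes < 2**(i+1), and return {i: median}."""
    bins = defaultdict(list)
    for total_bytes, throughput in flows:
        if total_bytes < 1:
            continue  # an empty flow has no defined bin
        # For a positive integer n, n.bit_length() - 1 equals floor(log2(n)).
        bins[total_bytes.bit_length() - 1].append(throughput)
    return {i: median(values) for i, values in sorted(bins.items())}


def steady_state_flows(flows, threshold=STEADY_STATE_BYTES):
    """Keep only flows large enough to be treated as past slow start."""
    return [flow for flow in flows if flow[0] >= threshold]


flows = [(1024, 0.25), (1500, 0.75), (2048, 0.5), (2_000_000, 0.625)]
per_bin = median_throughput_by_size_bin(flows)
large = steady_state_flows(flows)
print(per_bin, large)
```

Inspecting the per-bin medians for the size at which they level off is one way the stabilization threshold could be chosen before the filter is applied.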

As an illustration, and referring now also to FIG. 3, a line graph featuring a distribution of measured throughput values over 1 MB plus flows for several (application, content-provider) flow types is schematically illustrated. Specifically, FIG. 3 illustrates graphs for rate-capped flow types, partially rate-limited flow types, non-rate-limited flow types, and a cumulative graph for all of the flow types. Flows having rate-capped flow types may be capped at a particular throughput by a content provider associated with the flows, and may not reach a maximum possible throughput. In particular, rate-capped flow types may be flow types that never reach the available capacity of the network. Flows may also appear to be rate-capped based on traffic shaping by the content provider, application protocol bottlenecks, and/or congestion/capacity issues. Non-rate-limited flow types may be flow types that are not capped by a content provider or are not otherwise rate limited. Data flows having a partially rate-limited flow type may be rate-limited in some throughput ranges and non-rate-limited in other throughput ranges. In particular, partially rate-limited flow types may be defined as those flow types having a significant fraction of rate-limited flows. As FIG. 3 illustrates, the rate-capped flow type depicts a bottleneck or a rate-limitation by the content provider since none of the flows reach the higher possible throughputs, as illustrated in the tail of the all flow types line. In contrast, the non-rate-limited flow type is illustrated in FIG. 3 as having throughput values across the possible spectrum of throughputs. The partially rate-limited flow type in FIG. 3 depicts the bimodal nature of the flow type. Specifically, the partially rate-limited flow type is rate-limited in the 0-40% range and non-rate-limited in the 40-100% range. Accordingly, the electronic data processor 106 may be configured to incorporate other factors in performing the estimations.

In order to provide a more accurate estimate of maximum throughput for the data flows, the electronic data processor 106 may be configured to filter out applications and content providers that have flow distributions that are similar to the rate-capped flow types and partially rate-limited flow types. In FIG. 3, in order to identify rate-capped flows, it is noted that the rate-capped flow distribution does not cross the tail of the all flows distribution. As an example, if it may be assumed that at least five percent of all 1 MB plus flows reach the available capacity of the communications network 104, then a non-rate-capped or non-rate-limited flow type may have a 95^(th) percentile throughput at least as large as the 95^(th) percentile throughput of all 1 MB plus data flows since all 1 MB plus flows may include both rate-limited and non-rate-limited flow records. Referring now also to FIG. 4, a histogram of the 95^(th) percentile of throughput from each application, content-provider, and application/content provider flow type respectively is schematically illustrated. The flow types may be defined by application only, content provider only, and as an application/content provider pair.
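As a non-limiting illustration, the 95^(th) percentile comparison described above may be sketched as follows. The function names and the nearest-rank percentile definition are assumptions made for this sketch; the disclosure does not specify how the percentile is computed.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (an assumed definition): the smallest value
    such that at least p percent of the data is less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def rate_capped_types(throughputs_by_type, p=95):
    """Return the flow types whose p-th percentile throughput falls below
    the p-th percentile throughput of all flows taken together.

    throughputs_by_type maps a flow type to the throughputs of its
    1 MB plus flows.
    """
    all_flows = [t for values in throughputs_by_type.values() for t in values]
    cutoff = percentile(all_flows, p)
    return {flow_type
            for flow_type, values in throughputs_by_type.items()
            if percentile(values, p) < cutoff}


sample = {
    ("http", "C1"): [1.0, 2.0, 3.0, 10.0],   # reaches the high tail
    ("video", "C2"): [1.0, 1.0, 2.0, 2.0],   # never exceeds 2.0
}
capped = rate_capped_types(sample)
print(capped)
```

In this toy input the overall 95^(th) percentile is 10.0, so the second flow type, whose 95^(th) percentile is 2.0, is the only one classified as rate-capped.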

FIG. 4 illustrates that for the content provider and application/content provider combination, there is a mode to the right of the line 402 (the “95^(th) percentile of all 1 MB plus flows” line). The mode to the right of the line 402 represents flows having non-rate-limited/non-rate-capped flow types. The distribution for applications does not have such a mode. This may suggest that rate-capping is done primarily by content providers instead of application protocols. Flow types to the left of the line 402 may be classified as being rate-capped. As noted above, partially rate-limited flows have a bimodal nature which includes a distribution having rate-limited and non-rate-limited portions. Such changes in the distribution may be observed by examining the slope of a flow type's cumulative distribution function (CDF). A heuristic as follows may be utilized: Let s_(i) and s_(i+5) be the slopes at percentile i and i+5 respectively. The slope ratio of s_(i) and s_(i+5) may be s_(i)/s_(i+5). The maximum slope ratio may be defined as the greatest slope ratio over i ∈ [7, 8, 9, . . . , 93] (the top and bottom percentiles may be ignored to guard against outliers). The maximum slope ratio will be large if there is a dramatic decrease in slope within any five percentile range. In practice, s_(i) may be approximated as the difference between percentile (i−2.5) and percentile (i+2.5).

Additionally, FIG. 5 is a histogram which illustrates the maximum slope ratio of each (application, content provider) flow type. The maximum slope ratio may be computed and depicted on the histogram using a logarithmic scale. FIG. 5 illustrates only flow types having at least 100 flow records. A primary mode to the left of the line 502, at a maximum slope ratio equal to five, is shown. This primary mode may represent flow types that do not have dramatic changes in slope. However, a long tail is pictured to the right of the line 502. Flow types to the right of the partially rate-limiting threshold line 502 may be identified as partially rate-limited. Utilizing a partially rate-limiting threshold equal to five may capture the majority of flow types in the main mode.
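As a non-limiting illustration, the maximum slope ratio heuristic may be sketched as follows, taking the ratio at percentile i to be the slope at i divided by the slope at i+5 and treating a maximum slope ratio above the threshold of five as the sign of a partially rate-limited flow type, consistent with the long tail described above. The function names and the representation of the slopes as a mapping from percentile to a positive slope value are assumptions made for this sketch; how each slope is obtained from the measured distribution is left outside the sketch.

```python
def max_slope_ratio(slope_at, lo=7, hi=93, step=5):
    """Greatest ratio slope_at[i] / slope_at[i + step] for i in lo..hi.

    slope_at maps an integer percentile to a strictly positive slope of
    the flow type's distribution at that percentile. Entries are needed
    for every percentile from lo through hi + step.
    """
    return max(slope_at[i] / slope_at[i + step] for i in range(lo, hi + 1))


def is_partially_rate_limited(slope_at, threshold=5.0):
    """Assumed reading: a ratio above the threshold marks the flow type."""
    return max_slope_ratio(slope_at) > threshold


# A distribution with constant slope has a maximum slope ratio of one.
flat = {i: 1.0 for i in range(7, 99)}

# A distribution whose slope drops tenfold at the 40th percentile.
bimodal = {i: (10.0 if i < 40 else 1.0) for i in range(7, 99)}

print(max_slope_ratio(flat), max_slope_ratio(bimodal))
```

Restricting i to the range 7 through 93 mirrors the stated choice to ignore the top and bottom percentiles as a guard against outliers.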

In light of the above, the electronic data processor 106 may be configured to determine the flow types of each data flow based on the application field and the content provider field of the flow record. Upon determining the flow types of the data flows, the electronic data processor 106 may be configured to generate/construct a throughput index. The throughput index may be utilized to filter out all flow types which are not non-rate-limited or non-rate-capped flow types. In other words, the throughput index may be configured to include only those flow types which are non-rate-limited or non-rate-capped. FIG. 6 illustrates a table featuring the top fifteen flow types by number of 1 MB plus flows, whether they are identified as rate-capped (C) and/or partially rate-limited (L), and their corresponding mean and median throughputs. Entries shown in bold are entered into the throughput index generated by the electronic data processor 106 because they are either non-rate-limited or non-rate-capped flow types. As noted in the table of FIG. 6, the mean and median throughputs of the non-rate-limited or non-rate-capped flow types are much closer in value than those of the rate-capped or partially rate-limited flow types.

Notably, the electronic data processor 106 may determine the flow types of each data flow based on both the application field and the content provider field rather than based on the fields individually, because some content providers may have both non-rate-limited and rate-limited applications. Such a scenario is depicted by content providers C2 and C5 of FIG. 6. FIG. 7 illustrates the percentage of flows and flow types in each flow type category. Specifically, FIG. 7 indicates that nearly 60% of large flows are rate-capped and are thus unable to reach maximum throughput capacity of the communications network 104. Additionally, FIG. 7 indicates that 38.7% of 1 MB plus flows and 23.1% of flow types are utilized in the throughput index. As noted above, the throughput index may be utilized to filter out all flow types which are not non-rate-limited or non-rate-capped flow types. This enables the electronic data processor 106 to select only those data flows that are non-rate-limited or non-rate-capped for generating the maximum throughput estimations. In one embodiment, the throughput index may be recalculated by the electronic data processor 106 on a set or random time interval.
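As a non-limiting illustration, constructing the throughput index and using it as a filter may be sketched as follows. The function names and the representation of a flow as an (application, content provider, throughput) triple are assumptions made for this sketch, and the sets of rate-capped and partially rate-limited flow types are taken as given inputs.

```python
def build_throughput_index(flow_types, rate_capped, partially_rate_limited):
    """Keep only flow types that are neither rate-capped nor partially
    rate-limited. Each flow type is an (application, provider) pair."""
    return {ft for ft in flow_types
            if ft not in rate_capped and ft not in partially_rate_limited}


def select_flows(records, index):
    """Keep the records whose (application, provider) pair is in the index."""
    return [r for r in records if (r[0], r[1]) in index]


# Hypothetical records: (application, content provider, throughput).
records = [
    ("http", "C2", 5.0),
    ("video", "C2", 1.0),   # same provider, different application
    ("http", "C5", 6.0),
]
flow_types = {(app, provider) for app, provider, _ in records}
index = build_throughput_index(
    flow_types,
    rate_capped={("video", "C2")},
    partially_rate_limited=set(),
)
selected = select_flows(records, index)
print(index, selected)
```

Because the index is keyed by the pair, the hypothetical provider C2 contributes its non-rate-limited application while its rate-capped application is filtered out, which keying by provider alone could not express.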

Upon using the throughput index as a filter to filter out the appropriate flows, the electronic data processor 106 may then proceed to estimate maximum throughput. The electronic data processor 106 may be configured to aggregate the byte/duration measurements of the flows in the throughput index. The aggregation may be performed using a plurality of methods. For example, one method (TI-F) may include taking a mean over the throughputs of all flow records in the throughput index. The aggregate resulting from this method may be robust to outlier users since it weights a very large number of flows from different users equally. This method may, however, be sensitive to non-network problems. A second method (TI-T) for aggregating the byte/duration measurements may include having the electronic data processor 106 compute the mean (average) of the means (averages) of each flow type in the throughput index. This second method weights each flow type equally, so it is more robust to unexpected changes between individual content providers; however, it may be more sensitive to unpopular flow types that may be used infrequently. Either method, along with other methods, may be utilized by the processor to provide the estimations of maximum throughput.
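As a non-limiting illustration, the two aggregation methods may be sketched as follows. The function names ti_f and ti_t and the representation of a flow as a (flow type, throughput) pair are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def ti_f(flows):
    """TI-F: mean over the throughputs of all flows in the throughput
    index, so every flow carries equal weight."""
    return mean(throughput for _, throughput in flows)


def ti_t(flows):
    """TI-T: mean of the per-flow-type means, so every flow type carries
    equal weight regardless of how many flows it has."""
    by_type = defaultdict(list)
    for flow_type, throughput in flows:
        by_type[flow_type].append(throughput)
    return mean(mean(values) for values in by_type.values())


# Hypothetical flows already filtered by the throughput index.
flows = [
    (("http", "C1"), 4.0),
    (("http", "C1"), 4.0),
    (("http", "C1"), 4.0),
    (("ftp", "C3"), 8.0),
]
print(ti_f(flows), ti_t(flows))
```

With three flows of one type at 4.0 and a single flow of another type at 8.0, TI-F yields 5.0 while TI-T yields 6.0, which shows how an infrequently used flow type has more influence under TI-T.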

In an embodiment, the electronic data processor 106 may be configured to validate or evaluate, or both validate and evaluate, the estimations of maximum throughput that were based on passively collected flow records against a set of active measurements, which may be retrieved from probes placed along various points in the communications network 104. In an example, each probe that is placed in the communications network 104 may be configured to perform a throughput measurement by downloading a file via FTP from a server. The active maximum throughput estimate may be the mean of all measurements from all probes in the region of the communications network 104 that the probes are placed in. The passive maximum throughput measurements may then be compared to the active throughput measurements for a time interval during similar time periods. FIG. 8 is a scatter plot illustrating such a comparison between active and passive throughput estimates. Each point may represent the estimate for one hour in the largest region of the communications network 104. When the passive and active estimates have the same value, the corresponding points fall on the x=y line 802. Upon further inspection of FIG. 8, it may be seen that the all 1 MB plus flows approach produces estimates that are significantly less than the active measurements. Taking the mean over the throughputs of all flow records in the throughput index (TI-F as shown in FIG. 8) produces estimates that are much closer, but are still generally less. This may be explained by the fact that some flows in the throughput index may still be rate-limited by application behaviors that are not detected. It may also be explained by the fact that active measurement probes may be in higher quality vantage points (i.e. better radio frequency conditions) than most typical real users.

In one embodiment, the electronic data processor 106 may be configured to compare the relative difference between the passive estimates and active estimates in other regions. FIG. 9 features a comparison of the relative difference between each set of passive and active estimates for all regions, along with the ten regions with the most active probe vantage points. As illustrated, the top of each bar in FIG. 9 may be configured to indicate the median relative difference (over all the hours) and the error bars may show the 25^(th) and 75^(th) percentiles. Both methods of aggregation, TI-F and TI-T, are shown as having roughly the same relative difference over all the regions and both have relative differences substantially less than the all 1 MB plus flows approach. Additionally, it may be expected that when the active estimates decrease, the passive estimates may similarly decrease. FIG. 10 illustrates a Pearson's correlation coefficient between each passive estimate time series and the corresponding active estimate time series in all regions and in the top ten regions. The error bars illustrate 95% confidence intervals of the correlation coefficients. Two perfectly correlated signals would have a correlation of one, and any correlation greater than 0.6 may be considered to be well correlated. FIG. 10 illustrates that both methods for aggregation, TI-F and TI-T, are at least as correlated with the active estimates as the all 1 MB plus flows estimates.
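As a non-limiting illustration, the two comparison statistics discussed above may be sketched as follows. The function names are hypothetical, and the definition of relative difference as (passive − active) / active is an assumption made for this sketch, since the disclosure does not state the formula. The Pearson correlation coefficient is computed from its standard definition.

```python
import math
from statistics import median


def relative_difference(passive, active):
    """Assumed definition: signed difference relative to the active value."""
    return (passive - active) / active


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hourly estimates for one region; the passive series is a
# constant fraction of the active series.
active = [1.0, 2.0, 3.0, 4.0]
passive = [0.8, 1.6, 2.4, 3.2]

median_rel_diff = median(relative_difference(p, a)
                         for p, a in zip(passive, active))
correlation = pearson(active, passive)
print(median_rel_diff, correlation)
```

The toy series illustrates why both statistics are reported: a passive estimate can sit consistently below the active one, giving a nonzero relative difference, while still tracking its changes with a correlation near one.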

Thus, the electronic data processor 106 may be configured to calculate maximum throughput for the data flows associated with computing devices 102 in the communications network 104 by utilizing passively collected flow records. Additionally, the electronic data processor 106 effectively utilizes a throughput index to filter out rate-capped and partially rate-limited flow types so as to provide estimations which correlate with active measurements. In one embodiment, the electronic data processor 106 may be further configured to adjust the predetermined time intervals used in collecting flow records and to estimate the average maximum throughput for each data flow having the non-rate-limited flow type at the adjusted predetermined time interval. Furthermore, in another embodiment, any estimates, throughput indices, or other data generated or accessed by the electronic data processor 106 may be stored in database 108.

Referring now also to FIG. 11, an exemplary method 1100 for passive estimation of throughput in a network is depicted. The method 1100 may include, at step 1102, collecting a flow record for each data flow from a plurality of data flows during a predetermined time interval. The predetermined time interval, for example, may be one minute, five minutes, or any other desired time interval. Each data flow of the plurality of data flows may be associated with one or more computing devices, such as those utilized in the systems described above. At step 1104, the method 1100 may include annotating the flow record for each data flow with an application field and a content provider field. As noted above, the application field may indicate an application protocol and the content provider field may indicate a content provider with which each data flow is communicating. Contents of the application field may be determined based on application headers and port numbers, and contents of the content provider field may be based on a header or a domain name service name of a server associated with the content provider. The flow records may be further annotated with a bytes field, a duration field, and a total bytes field, among other fields. The bytes field may indicate a volume of data transferred during the predetermined time interval. The duration field may indicate a time interval between first and last packets of each data flow, and the total bytes field may indicate a volume of data transferred since each data flow was initiated.

At step 1106, the method 1100 may include determining if the flow size of each data flow is large enough for the flow to achieve a steady-state throughput. For example, the method may involve determining if enough bytes were transferred in the flow to exit a slow-start phase during transmission of the flow. If it is determined that the flow size of the data flow is not large enough to achieve steady-state throughput, the method 1100 may include discarding or excluding the data flow from the passive throughput estimations, at step 1108. However, in an embodiment, the method may include such data flows as well. If, however, it is determined that the flow size of the data flow is large enough to achieve steady-state throughput, the method 1100 may include determining a flow type of each data flow based on the annotated application field and the content provider field of the flow record, at step 1110. Flow types may include rate-capped flow types, partially rate-limited flow types, non-rate-limited flow types, non-rate-capped flow types, and other flow types.
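One possible way to approximate the flow-size test of step 1106 is sketched below. The disclosure does not state a byte threshold, so the segment size, initial window, and slow-start threshold used here are assumed values, and the function names are hypothetical. The sketch models slow start as a congestion window that doubles each round trip until it reaches the threshold, and treats a flow as large enough once it has carried at least that many bytes.

```python
from typing import Optional


def slow_start_bytes(mss: int = 1460,
                     initial_window_segments: int = 10,
                     ssthresh_segments: int = 64) -> int:
    """Bytes sent while the window doubles from the initial window up to ssthresh.

    All three defaults are assumptions chosen for illustration.
    """
    cwnd = initial_window_segments
    sent_segments = 0
    while cwnd < ssthresh_segments:
        sent_segments += cwnd   # one window of segments per round trip
        cwnd *= 2               # slow start doubles the window each round trip
    return sent_segments * mss


def reached_steady_state(total_bytes: int,
                         threshold: Optional[int] = None) -> bool:
    """Return True if the flow carried enough bytes to have left slow start."""
    if threshold is None:
        threshold = slow_start_bytes()
    return total_bytes >= threshold
```

With the assumed defaults the window grows 10, 20, 40 segments before reaching the threshold, so the cutoff is 70 segments of 1460 bytes. A deployment would more likely calibrate the threshold empirically than rely on these values.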

At step 1112, the method 1100 may include determining if the flow type of the data flow is a non-rate-limited flow type or a non-rate-capped flow type. If the flow type of the data flow is determined to be a non-rate-limited flow type or a non-rate-capped flow type, the method 1100, at step 1114, may include generating a throughput index, which may be configured to include the flow type of each data flow determined to have either a non-rate-limited flow type or a non-rate-capped flow type. If, however, the flow type of the data flow is determined to be neither a non-rate-limited flow type nor a non-rate-capped flow type (e.g., a rate-capped flow type or a partially rate-limited flow type), the method 1100 may include rejecting the flow type from being included in the throughput index at step 1116. At step 1118, the method 1100 may include selecting each data flow that is determined to have the non-rate-limited flow type or non-rate-capped flow type in the throughput index. Once the data flows are selected, the method 1100 may include estimating an average maximum throughput for each data flow selected. In an embodiment, the estimations may be performed using any of the techniques described in the present disclosure.
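Steps 1110 through 1118 might be sketched as follows. The classification table, the dictionary-based flow representation, and all function names are hypothetical. The per-flow rate is computed simply as bytes divided by duration, which is an assumed stand-in for the estimation techniques of the disclosure rather than a statement of them.

```python
from statistics import mean

# Hypothetical mapping from (application, content provider) to a flow type.
FLOW_TYPE_TABLE = {
    ("http", "example-cdn"): "non-rate-limited",
    ("http", "example-video"): "rate-capped",
    ("rtsp", "example-video"): "partially-rate-limited",
}
INDEXABLE_TYPES = {"non-rate-limited", "non-rate-capped"}


def classify(flow):
    """Step 1110: determine the flow type from the two annotated fields."""
    key = (flow["application"], flow["content_provider"])
    return FLOW_TYPE_TABLE.get(key, "other")


def build_throughput_index(flows):
    """Steps 1112-1116: keep only flow types that are not rate limited or capped."""
    return {
        (f["application"], f["content_provider"])
        for f in flows
        if classify(f) in INDEXABLE_TYPES
    }


def select_flows(flows, index):
    """Step 1118: select the flows whose flow type appears in the index."""
    return [f for f in flows if (f["application"], f["content_provider"]) in index]


def throughput_bps(flow):
    """Assumed per-flow rate: bits transferred divided by flow duration."""
    return 8 * flow["bytes"] / flow["duration"]


def estimate_average_throughput(flows):
    """Average the per-flow rates of the selected flows; None if none qualify."""
    index = build_throughput_index(flows)
    rates = [throughput_bps(f) for f in select_flows(flows, index) if f["duration"] > 0]
    return mean(rates) if rates else None
```

Flows whose (application, content provider) pair is absent from the table fall into the "other" type and are left out of the index, which errs on the side of excluding flows that might be rate limited.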

In an embodiment, the method 1100 may include filtering out a flow record and/or flow type if an analysis of either the application field or the content provider field indicates a flow distribution that is similar to a rate-capped flow type or a partially rate-limited flow type. In another embodiment, the method 1100 may include validating the average maximum throughput estimated for each data flow by comparing the average maximum throughput estimated to a set of active measurements measured in the network. For example, the estimates may be compared to active measurements recorded by one or more probes positioned along various locations in the network. Additionally, the method 1100 may include determining the average maximum throughput for each data flow both in the upload direction and the download direction. In one embodiment, the method 1100 may include distinguishing the flows from one another by utilizing a tuple. As an illustration, the flows may be distinguished by using a (ipsrc, ipdst, sport, dport) tuple or other appropriate tuple. Furthermore, it is important to note that the methods described above may incorporate any of the functionality, devices, and/or features of the systems described above and are not intended to be limited to the description provided above.
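The tuple-based flow identification and the validation against active measurements described above might be sketched as follows. The function names, the dictionary-based record layout, and the 20 percent tolerance are assumptions made for illustration; the disclosure does not specify how closely a passive estimate must agree with the probe measurements.

```python
from collections import defaultdict
from statistics import mean


def group_by_flow(records):
    """Group flow records by the (ipsrc, ipdst, sport, dport) tuple."""
    flows = defaultdict(list)
    for r in records:
        key = (r["ipsrc"], r["ipdst"], r["sport"], r["dport"])
        flows[key].append(r)
    return dict(flows)


def validate(passive_estimate_bps, active_measurements_bps, tolerance=0.2):
    """Return True if the passive estimate is within `tolerance` (relative error)
    of the mean of the active probe measurements. The tolerance is an assumption."""
    active_mean = mean(active_measurements_bps)
    relative_error = abs(passive_estimate_bps - active_mean) / active_mean
    return relative_error <= tolerance
```

Comparing against the mean of the probe measurements is only one option; a median or a per-location comparison could be substituted without changing the structure of the check.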

The methodology and techniques described with respect to the exemplary embodiments can be performed using a machine or other computing device within which a set of instructions, when executed, may cause the machine to perform any one or more of the methodologies discussed above. In some embodiments, the machine operates as a standalone device. In some embodiments, the machine may be connected (e.g., using a network) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client user machine in a server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may comprise a server computer, a client user computer, a personal computer (PC), a tablet PC, a laptop computer, a desktop computer, a control system, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The machine may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory and a static memory, which communicate with each other via a bus. The machine may further include a video display unit (e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT)). The machine may include an input device (e.g., a keyboard), a cursor control device (e.g., a mouse), a disk drive unit, a signal generation device (e.g., a speaker or remote control) and a network interface device.

The disk drive unit may include a machine-readable medium on which is stored one or more sets of instructions (e.g., software) embodying any one or more of the methodologies or functions described herein, including those methods illustrated above. The instructions may also reside, completely or at least partially, within the main memory, the static memory, and/or within the processor during execution thereof by the machine. The main memory and the processor also may constitute machine-readable media.

Dedicated hardware implementations including, but not limited to, application specific integrated circuits, programmable logic arrays and other hardware devices can likewise be constructed to implement the methods described herein. Applications that may include the apparatus and systems of various embodiments broadly include a variety of electronic and computer systems. Some embodiments implement functions in two or more specific interconnected hardware modules or devices with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the example system is applicable to software, firmware, and hardware implementations.

In accordance with various embodiments of the present disclosure, the methods described herein are intended for operation as software programs running on a computer processor. Furthermore, software implementations, including but not limited to distributed processing or component/object distributed processing, parallel processing, or virtual machine processing, can also be constructed to implement the methods described herein.

The present disclosure contemplates a machine readable medium containing instructions, or that which receives and executes instructions from a propagated signal so that a device connected to a network environment can send or receive voice, video or data, and to communicate over the network using the instructions. The instructions may further be transmitted or received over a network via the network interface device.

While the machine-readable medium is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.

The term “machine-readable medium” shall accordingly be taken to include, but not be limited to: solid-state memories such as a memory card or other package that houses one or more read-only (non-volatile) memories, random access memories, or other re-writable (volatile) memories; magneto-optical or optical medium such as a disk or tape; or non-transitory mediums or other self-contained information archive or set of archives considered a distribution medium equivalent to a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a machine-readable medium or a distribution medium, as listed herein and including art-recognized equivalents and successor media, in which the software implementations herein are stored.

Although the present specification describes components and functions implemented in the embodiments with reference to particular standards and protocols, the disclosure is not limited to such standards and protocols. Each of the standards for Internet and other packet switched network transmission (e.g., TCP/IP, UDP/IP, HTML, HTTP) represents an example of the state of the art. Such standards are periodically superseded by faster or more efficient equivalents having essentially the same functions. Accordingly, replacement standards and protocols having the same functions are considered equivalents.

The illustrations of arrangements described herein are intended to provide a general understanding of the structure of various embodiments, and they are not intended to serve as a complete description of all the elements and features of apparatus and systems that might make use of the structures described herein. Many other arrangements will be apparent to those of skill in the art upon reviewing the above description. Other arrangements may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Figures are also merely representational and may not be drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Thus, although specific arrangements have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific arrangement shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments and arrangements of the invention. Combinations of the above arrangements, and other arrangements not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. Therefore, it is intended that the disclosure not be limited to the particular arrangement(s) disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments and arrangements falling within the scope of the appended claims.

We claim:
 1. A system for passive estimation of throughput, the system comprising: a memory that stores instructions; a processor that executes the instructions to perform operations, the operations comprising: annotating a flow record for each data flow of a plurality of data flows to include an application field and a content provider field, wherein the application field indicates an application protocol and the content provider field indicates a content provider with which each data flow is in communication; determining a flow type of each data flow based on the application field and the content provider field of the flow record; selecting each data flow for which the flow type is determined to have a non-rate-limited flow type; and estimating an average maximum throughput for each data flow selected.
 2. The system of claim 1, wherein the operations further comprise generating a throughput index, wherein the throughput index comprises the flow type determined for each data flow if the flow type is determined to have the non-rate-limited flow type.
 3. The system of claim 2, wherein the operations further comprise selecting, from the throughput index, each data flow for which the flow type is determined to have the non-rate-limited flow type.
 4. The system of claim 2, wherein the operations further comprise rejecting, from the throughput index, each data flow determined to not have the non-rate-limited flow type.
 5. The system of claim 1, wherein the operations further comprise determining if each data flow of the plurality of data flows has a flow size for achieving a steady-state throughput.
 6. The system of claim 5, wherein the operations further comprise excluding each data flow determined not to have the flow size for achieving the steady-state throughput when estimating the average maximum throughput.
 7. The system of claim 1, wherein the flow type is selected from the group comprising a rate-capped flow type, a partially rate-limited flow type, and the non-rate-limited flow type.
 8. The system of claim 1, wherein the operations further comprise validating the average maximum throughput estimated for each data flow selected by comparing the average maximum throughput to a set of active measurements measured in a network.
 9. The system of claim 1, wherein the operations further comprise recording the set of active measurements measured in the network by utilizing probes positioned along various locations in the network.
 10. The system of claim 1, wherein the operations further comprise accessing the flow record for each data flow of the plurality of data flows during a predetermined time interval.
 11. The system of claim 10, wherein the operations further comprise adjusting the predetermined time interval, and wherein the operations further comprise estimating the average maximum throughput for each data flow selected based on the adjusted predetermined time interval.
 12. A method for passive estimation of throughput, the method comprising: annotating a flow record for each data flow of a plurality of data flows to include an application field and a content provider field, wherein the application field indicates an application protocol and the content provider field indicates a content provider with which each data flow is in communication; determining a flow type of each data flow based on the application field and the content provider field of the flow record; selecting each data flow for which the flow type is determined to have a non-rate-limited flow type; and estimating, by utilizing instructions from memory that are executed by a processor, an average maximum throughput for each data flow selected.
 13. The method of claim 12, further comprising validating the average maximum throughput estimated for each data flow selected by comparing the average maximum throughput to a set of active measurements measured in a network.
 14. The method of claim 13, further comprising recording the set of active measurements measured in the network by utilizing probes positioned along various locations in the network.
 15. The method of claim 12, further comprising determining if each data flow of the plurality of data flows has a flow size for achieving a steady-state throughput by determining if enough bytes were transmitted in each data flow to exit a slow-start phase during transmission of each data flow.
 16. The method of claim 15, further comprising excluding each data flow determined not to have the flow size for achieving the steady-state throughput when estimating the average maximum throughput.
 17. The method of claim 12, further comprising generating a throughput index, wherein the throughput index comprises the flow type determined for each data flow if the flow type is determined to have the non-rate-limited flow type.
 18. The method of claim 17, further comprising rejecting, from the throughput index, each data flow determined to not have the non-rate-limited flow type.
 19. The method of claim 12, further comprising storing the flow record for each data flow of the plurality of data flows during a predetermined time interval.
 20. A computer-readable device comprising instructions, which, when loaded and executed by a processor, cause the processor to perform operations comprising: annotating a flow record for each data flow of a plurality of data flows to include an application field and a content provider field, wherein the application field indicates an application protocol and the content provider field indicates a content provider with which each data flow is in communication; determining a flow type of each data flow based on the application field and the content provider field of the flow record; selecting each data flow for which the flow type is determined to have a non-rate-limited flow type; and estimating an average maximum throughput for each data flow selected.